As an ordinary resident living in Melbourne, I want to avoid the difficulty of parking in the city.
- Where is it difficult to park now?
- Is it peak parking time now?
At the end of this use case you will:
- Get to know the Melbourne city center parking map
- Understand the current parking hotspots in Melbourne city center
- Learn about multi-file data filtering
- Deepen understanding of snapshot data
- Understand the necessity of data processing
- This notebook performs exploratory data analysis of Melbourne parking sensor data, combining the Melbourne parking bay dataset with the parking sensor dataset. It is built from exploratory analysis, experiments, and assumptions, so it does not present rigorous conclusions.
- Because the sensor data is a snapshot, the data in this notebook is a historical record from the time the notebook was run. The notebook serves only as a reference and tutorial and does not provide real-time data; download and rerun it if you need current values. Owing to sensor failures and update delays, data accuracy is not guaranteed.
Get and process data¶
# from google.colab import drive
import os
import requests
import json
import pytz
import folium
import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO
from datetime import datetime, timezone
from pandas.tseries.offsets import Minute
from sklearn.cluster import DBSCAN
Fetch Data from Open Data Melbourne¶
# drive.mount('/content/gdrive')
# with open('/content/gdrive/My Drive/Colab Notebooks/apiKey.txt', 'r') as file:
# api_key = file.read().strip()
# api_key = os.getenv(api_key)
# export endpoint
# https://data.melbourne.vic.gov.au/explore/dataset/on-street-parking-bay-sensors/information/?dataChart=eyJxdWVyaWVzIjpbeyJjaGFydHMiOlt7InR5cGUiOiJjb2x1bW4iLCJmdW5jIjoiQ09VTlQiLCJ5QXhpcyI6ImJheV9pZCIsInNjaWVudGlmaWNEaXNwbGF5Ijp0cnVlLCJjb2xvciI6IiNFNTBFNTYifV0sInhBeGlzIjoibGFzdHVwZGF0ZWQiLCJtYXhwb2ludHMiOjUwLCJzb3J0IjoiIiwidGltZXNjYWxlIjoieWVhciIsImNvbmZpZyI6eyJkYXRhc2V0Ijoib24tc3RyZWV0LXBhcmtpbmctYmF5LXNlbnNvcnMiLCJvcHRpb25zIjp7fX19XSwidGltZXNjYWxlIjoiIiwiZGlzcGxheUxlZ2VuZCI6dHJ1ZSwiYWxpZ25Nb250aCI6dHJ1ZX0%3D
dataset_id = 'on-street-parking-bay-sensors'
base_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
# apikey = api_key
export_format = 'csv'  # avoid shadowing the built-in name 'format'
url = f'{base_URL}{dataset_id}/exports/{export_format}'
params = {
    'select': '*',
    'limit': -1,  # all records
    'lang': 'en',
    'timezone': 'UTC',
    # 'api_key': apikey
}
# GET request
response = requests.get(url, params = params)
if response.status_code == 200:
    # StringIO to read the CSV data
    url_content = response.content.decode('utf-8')
    df = pd.read_csv(StringIO(url_content), delimiter = ';')
    print(df.sample(2, random_state = 999))  # test
else:
    print(f'Request failed with status code {response.status_code}')
lastupdated status_timestamp zone_number \
2278 2024-05-11T03:30:26+00:00 2024-05-11T03:06:09+00:00 7740.0
1566 2024-04-30T04:44:26+00:00 2024-04-15T00:56:50+00:00 7833.0
status_description kerbsideid location
2278 Present 10742 -37.80379669737577, 144.95088211182127
1566 Present 13610 -37.829771231755444, 144.96716983918282
# check length in dataset
print(f'The dataset contains {len(df)} records.')
The dataset contains 6246 records.
Data preprocessing¶
Separate latitude and longitude in data
print(df.dtypes)
lastupdated            object
status_timestamp       object
zone_number           float64
status_description     object
kerbsideid              int64
location               object
dtype: object
# Split the location string into latitude and longitude, then assign to new columns
df['latitude'] = df['location'].apply(lambda x: float(x.split(',')[0].strip()))
df['longitude'] = df['location'].apply(lambda x: float(x.split(',')[1].strip()))
# Drop the original location column
df.drop(columns = ['location'], inplace=True)
print(df.head(3))
                 lastupdated           status_timestamp  zone_number  \
0  2023-10-25T01:44:02+00:00  2023-10-25T01:07:05+00:00       7556.0
1  2023-10-25T01:44:02+00:00  2023-10-25T00:57:43+00:00       7556.0
2  2023-10-25T01:44:02+00:00  2023-10-25T01:12:46+00:00       7556.0

  status_description  kerbsideid   latitude   longitude
0            Present        5730 -37.810582  144.968053
1            Present        5728 -37.810566  144.968109
2            Present        5750 -37.810558  144.968137
Check null values and data types
def validate_data(df):
    print("\nChecking for NaN values:")
    print(df.isna().sum())
    print("\nChecking for Duplicates:")
    print(df.duplicated().sum())
    print("\nChecking data types:")
    print(df.dtypes)

validate_data(df)
Checking for NaN values:
lastupdated             0
status_timestamp        0
zone_number           487
status_description      0
kerbsideid              0
latitude                0
longitude               0
dtype: int64

Checking for Duplicates:
0

Checking data types:
lastupdated            object
status_timestamp       object
zone_number           float64
status_description     object
kerbsideid              int64
latitude              float64
longitude             float64
dtype: object
Since the zone number affects neither the geographic coordinates nor the sensor ID, the missing zone numbers are left as-is. Below, we check for and remove abnormal time records, deleting records older than one year.
# Define the function to convert columns to datetime
def convert_to_datetime(df, column_name):
    try:
        df[column_name] = pd.to_datetime(df[column_name], errors = 'coerce', format = "%Y-%m-%dT%H:%M:%S+00:00")
    except ValueError as e:
        print(f"Error converting {column_name}: {e}")
# Create a new DataFrame for analysis based on the original df
df_analysis = df.copy()
# Apply the datetime conversion function
convert_to_datetime(df_analysis, 'lastupdated')
convert_to_datetime(df_analysis, 'status_timestamp')
# Check for NaT values in datetime columns
nat_counts = df_analysis[['lastupdated', 'status_timestamp']].isna().sum()
print("NaT values after conversion:")
print(nat_counts)
NaT values after conversion: lastupdated 0 status_timestamp 0 dtype: int64
# Identify the most recent timestamp
latest_timestamp = df_analysis['status_timestamp'].max()
# Calculate the date one year ago from the most recent timestamp
one_year_ago = latest_timestamp - pd.DateOffset(years=1)
# Filter and process as per your previous logic
outdated_rows = df_analysis[df_analysis['status_timestamp'] < one_year_ago].copy()
print("Outdated rows:")
print(outdated_rows.sort_values(by = 'status_timestamp').head())
# Print the data header sorted by 'status_timestamp'
print("\nData header sorted by time:")
print(df_analysis.sort_values(by = 'status_timestamp').head())
Outdated rows:
lastupdated status_timestamp zone_number status_description \
477 2023-12-11 23:45:34 1970-01-01 7498.0 Unoccupied
67 2023-11-22 00:44:42 1970-01-01 NaN Unoccupied
114 2023-11-22 00:44:42 1970-01-01 7531.0 Unoccupied
110 2023-11-22 00:44:42 1970-01-01 7531.0 Unoccupied
664 2023-11-22 00:44:42 1970-01-01 NaN Unoccupied
kerbsideid latitude longitude
477 7740 -37.812619 144.953622
67 8040 -37.818000 144.954134
114 8033 -37.817782 144.954895
110 8027 -37.817671 144.955279
664 6246 -37.817832 144.958757
Data header sorted by time:
lastupdated status_timestamp zone_number status_description \
664 2023-11-22 00:44:42 1970-01-01 NaN Unoccupied
110 2023-11-22 00:44:42 1970-01-01 7531.0 Unoccupied
114 2023-11-22 00:44:42 1970-01-01 7531.0 Unoccupied
67 2023-11-22 00:44:42 1970-01-01 NaN Unoccupied
477 2023-12-11 23:45:34 1970-01-01 7498.0 Unoccupied
kerbsideid latitude longitude
664 6246 -37.817832 144.958757
110 8027 -37.817671 144.955279
114 8033 -37.817782 144.954895
67 8040 -37.818000 144.954134
477 7740 -37.812619 144.953622
# Delete outdated rows
df_analysis = df_analysis[df_analysis['status_timestamp'] >= one_year_ago]
df_analysis.shape
(6154, 7)
Next, check for and remove duplicate records based on the geographic coordinates, parking bay status, and ID. Because the data is fetched through repeated API requests, duplicate records may be returned, so this step cannot be omitted.
# Detect duplicates based on all columns
duplicates = df_analysis.duplicated(
subset = ['kerbsideid', 'status_description', 'latitude', 'longitude'],
keep = False)
# Display duplicates
print("Duplicate Rows based on all columns are:")
print(df_analysis[duplicates].sort_values('kerbsideid'))
Duplicate Rows based on all columns are:
Empty DataFrame
Columns: [lastupdated, status_timestamp, zone_number, status_description, kerbsideid, latitude, longitude]
Index: []
# Remove duplicates based on specified columns in df_analysis
df_analysis.drop_duplicates(
subset = ['kerbsideid', 'status_description',
'latitude', 'longitude', 'lastupdated'],
keep = 'first', inplace = True)
Data analysis¶
Processing¶
Get the total number of unique parking spaces
# Drop duplicate rows based on 'latitude' and 'longitude'
unique_locations = df_analysis.drop_duplicates(subset = ['latitude', 'longitude'])
# Count the number of unique parking spaces based on location
total_unique_parking_spaces = len(unique_locations)
print(f"The total number of unique parking spaces based on location is {total_unique_parking_spaces}.")
The total number of unique parking spaces based on location is 5800.
Get geographic locations that appear more than once in the data.
# Group by latitude and longitude and count occurrences in df_analysis
location_counts = df_analysis.groupby(
['latitude', 'longitude']).size().reset_index(name = 'Occurrences')
# Identify locations that appear more than once in df_analysis
duplicate_locations = location_counts[location_counts['Occurrences'] > 1]
# Print out the duplicate locations in df_analysis
if duplicate_locations.empty:
    print("Each geographic location occurs only once.")
else:
    print("Duplicate geographic locations found:")
    print(duplicate_locations)
Duplicate geographic locations found:
latitude longitude Occurrences
1031 -37.819612 144.954532 2
1046 -37.819562 144.954509 2
1058 -37.819512 144.954486 2
1066 -37.819463 144.954464 2
1133 -37.819000 144.958300 2
... ... ... ...
3862 -37.809577 144.971496 2
3868 -37.809559 144.971561 2
3869 -37.809551 144.971585 2
3871 -37.809544 144.971611 2
3874 -37.809536 144.971637 2
[353 rows x 3 columns]
Get geographic locations with multiple kerbsideid values
# Filter rows in df_analysis where geographic locations are duplicated
duplicate_locations = df_analysis[
df_analysis.duplicated(subset = ['latitude', 'longitude'], keep = False)]
# Sort these rows in df_analysis by latitude, longitude, and kerbsideid for easier inspection
sorted_duplicates = duplicate_locations.sort_values(
by = ['latitude', 'longitude', 'kerbsideid'])
# Print kerbsideid for duplicate locations in df_analysis
print("Kerbside id for duplicate locations:")
print(sorted_duplicates[['latitude', 'longitude', 'kerbsideid']])
Kerbside id for duplicate locations:
latitude longitude kerbsideid
149 -37.819612 144.954532 8057
3578 -37.819612 144.954532 57924
595 -37.819562 144.954509 8056
2062 -37.819562 144.954509 57925
176 -37.819512 144.954486 8055
... ... ... ...
3476 -37.809551 144.971585 65038
219 -37.809544 144.971611 6086
3477 -37.809544 144.971611 65037
35 -37.809536 144.971637 6087
3478 -37.809536 144.971637 65036
[707 rows x 3 columns]
Check if there are duplicate kerbsideid
# df_analysis is the DataFrame for analysis
duplicate_kerbsideid = df_analysis[df_analysis['kerbsideid'].duplicated(keep = False)]
# Sort the data for easier inspection
duplicate_kerbsideid = duplicate_kerbsideid.sort_values('kerbsideid')
if len(duplicate_kerbsideid) > 0:
    print("Duplicate kerbsideid found in df_analysis:")
    print(duplicate_kerbsideid[['kerbsideid', 'latitude', 'longitude']])
else:
    print("No duplicate kerbsideid found in df_analysis.")
No duplicate kerbsideid found in df_analysis.
Explore the earliest time each parking space is empty
# Filter for 'Unoccupied' status in df_analysis
unoccupied_spaces = df_analysis[df_analysis['status_description'] == 'Unoccupied']
# Sort by 'status_timestamp' in df_analysis
sorted_unoccupied = unoccupied_spaces.sort_values(by = 'status_timestamp')
# Find the earliest time for each unique parking space based on latitude and longitude in df_analysis
earliest_unoccupied = sorted_unoccupied.drop_duplicates(
subset = ['latitude', 'longitude'], keep = 'first')
# Display the result
print("The earliest time each parking space was detected as empty in df_analysis:")
print(earliest_unoccupied[['latitude', 'longitude', 'status_timestamp']])
The earliest time each parking space was detected as empty in df_analysis:
latitude longitude status_timestamp
5257 -37.806193 144.952080 2023-05-27 16:40:41
3822 -37.806147 144.952089 2023-06-04 06:46:46
5256 -37.806170 144.952085 2023-06-05 07:35:13
3819 -37.806027 144.952110 2023-06-07 00:41:04
2303 -37.805884 144.952136 2023-06-07 01:13:53
... ... ... ...
4567 -37.812266 144.970216 2024-05-11 03:28:40
2001 -37.814475 144.974604 2024-05-11 03:28:41
6169 -37.815803 144.972748 2024-05-11 03:28:43
6238 -37.814150 144.960434 2024-05-11 03:28:43
3168 -37.813994 144.977744 2024-05-11 03:28:51
[1938 rows x 3 columns]
Check whether the currently vacant parking spaces were reported as occupied within the past few hours.
# Convert to datetime format
df_analysis['lastupdated'] = pd.to_datetime(df_analysis['lastupdated'])
df_analysis['status_timestamp'] = pd.to_datetime(df_analysis['status_timestamp'])
# Sort and filter
sorted_df = df_analysis.sort_values(by = ['latitude', 'longitude', 'status_timestamp'])
current_vacant = sorted_df[sorted_df['status_description'] == 'Unoccupied'].copy()
# Collect matching rows in a list and build the result DataFrame once
# (appending with pd.concat inside the loop is slow and deprecated-prone)
result_rows = []
# Loop through each vacant space to check its history
for idx, row in current_vacant.iterrows():
    lat, lon, current_time = row['latitude'], row['longitude'], row['status_timestamp']
    time_threshold = current_time - pd.Timedelta(hours = 3)
    previous_records = sorted_df[
        (sorted_df['latitude'] == lat) &
        (sorted_df['longitude'] == lon) &
        (sorted_df['status_timestamp'] >= time_threshold) &
        (sorted_df['status_description'] == 'Present')]
    if not previous_records.empty:
        last_occupied_time = previous_records['status_timestamp'].max()
        result_rows.append({'latitude': lat, 'longitude': lon,
                            'last_occupied_time': last_occupied_time})
result = pd.DataFrame(result_rows, columns = ['latitude', 'longitude', 'last_occupied_time'])
# Display the result
print("Parking spaces that were occupied a few hours ago and are now vacant:")
print(result)
Parking spaces that were occupied a few hours ago and are now vacant:
latitude longitude last_occupied_time
0 -37.818448 144.959387 2024-03-13 03:03:31
1 -37.818156 144.957058 2024-05-11 01:52:59
2 -37.818018 144.954073 2024-05-10 20:33:28
3 -37.817982 144.954198 2024-04-03 22:49:13
4 -37.817890 144.958554 2024-05-11 02:09:09
.. ... ... ...
60 -37.809800 144.970405 2024-05-11 02:40:45
61 -37.809787 144.972304 2024-05-11 03:06:17
62 -37.809599 144.971422 2024-05-11 03:06:00
63 -37.809577 144.971496 2024-05-11 02:18:32
64 -37.809544 144.971611 2024-05-11 00:56:13
[65 rows x 3 columns]
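The row-by-row scan above rescans the whole frame for every vacant space, which is O(n²). The same check can be vectorised with one groupby plus a merge. A minimal sketch on made-up coordinates (not the dataset's values): one location was occupied at 09:00 and is vacant at 10:00, another location has no occupied history.

```python
import pandas as pd

# Toy records (made-up coordinates and times)
df = pd.DataFrame({
    'latitude': [-37.81, -37.81, -37.82],
    'longitude': [144.96, 144.96, 144.97],
    'status_description': ['Present', 'Unoccupied', 'Unoccupied'],
    'status_timestamp': pd.to_datetime(
        ['2024-05-11 09:00', '2024-05-11 10:00', '2024-05-11 10:00']),
})

# Last 'Present' time per location, computed once instead of per-row scans
last_present = (df[df['status_description'] == 'Present']
                .groupby(['latitude', 'longitude'])['status_timestamp']
                .max().rename('last_occupied_time').reset_index())

# Attach it to the vacant rows and keep matches within the 3-hour window
vacant = df[df['status_description'] == 'Unoccupied']
result = vacant.merge(last_present, on=['latitude', 'longitude'], how='inner')
result = result[result['last_occupied_time']
                >= result['status_timestamp'] - pd.Timedelta(hours=3)]
print(result[['latitude', 'longitude', 'last_occupied_time']])
```

Only the first location survives: it has a 'Present' record inside the window, while the second has no occupied history and is dropped by the inner merge.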
Analyze results¶
It turns out that very few sensors report both statuses, while some geographic locations carry two parking bay IDs. Each bay ID has exactly one status, and there are no duplicate IDs. This is likely a property of the sensors: the recorded coordinates are the sensor's location, and one sensor can sometimes manage two parking bays.
We can conclude that this dataset is snapshot data: when a sensor reports a state change, the old state is deleted from the data. According to the official documentation, each sensor is also tested every morning, so only today's data is reliable; the remaining sensors are out of service due to construction closures and similar reasons.
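This snapshot behaviour can be illustrated on a toy frame: keeping only the most recent record per sensor reproduces what the feed does when a state change arrives. A minimal sketch with made-up kerbsideid values, not the real dataset:

```python
import pandas as pd

# Toy history for two hypothetical sensors (all values made up)
history = pd.DataFrame({
    'kerbsideid': [101, 101, 202],
    'status_description': ['Present', 'Unoccupied', 'Present'],
    'status_timestamp': pd.to_datetime(
        ['2024-05-11 09:00', '2024-05-11 10:30', '2024-05-11 08:15']),
})

# Keep only the latest report per sensor, mimicking the snapshot feed
snapshot = (history.sort_values('status_timestamp')
                   .drop_duplicates(subset='kerbsideid', keep='last')
                   .reset_index(drop=True))
print(snapshot)
```

Sensor 101's earlier 'Present' row disappears once its 'Unoccupied' update lands, which is why the live export never contains two statuses for the same bay ID.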
Data processing¶
Convert time zone from UTC to Melbourne time
# Create a timezone object for Melbourne
melbourne_tz = pytz.timezone('Australia/Melbourne')
# Get the current time in UTC
current_time_utc = datetime.now(pytz.utc)
# Convert to Melbourne local time
current_time_melbourne = current_time_utc.astimezone(melbourne_tz)
print(f"Current time in UTC: {current_time_utc}")
print(f"Current time in Melbourne: {current_time_melbourne}")
Current time in UTC: 2024-05-11 03:40:22.985981+00:00
Current time in Melbourne: 2024-05-11 13:40:22.985981+10:00
# Check if 'status_timestamp' is timezone-naive
if df_analysis['status_timestamp'].dt.tz is None:
    # Localize naive datetime objects to UTC
    df_analysis['status_timestamp'] = df_analysis['status_timestamp'].dt.tz_localize('UTC')
else:
    # If already timezone-aware, ensure it's in UTC
    df_analysis['status_timestamp'] = df_analysis['status_timestamp'].dt.tz_convert('UTC')
# Convert 'status_timestamp' from UTC to Melbourne local time
df_analysis['status_timestamp'] = df_analysis['status_timestamp'].dt.tz_convert('Australia/Melbourne')
# Display the first few 'status_timestamp' to check
print(df_analysis['status_timestamp'].head())
0   2023-10-25 12:07:05+11:00
1   2023-10-25 11:57:43+11:00
2   2023-10-25 12:12:46+11:00
3   2023-10-25 06:31:04+11:00
4   2023-10-25 10:56:41+11:00
Name: status_timestamp, dtype: datetime64[ns, Australia/Melbourne]
Get today’s data volume
print("Unique dates in the dataset:", df_analysis['status_timestamp'].dt.date.unique())
# Get today's date from the 'current_time_melbourne' which is already timezone-aware
today_melbourne = current_time_melbourne.date()
print("Today's date in Melbourne:", today_melbourne)
# Convert 'status_timestamp' to date only for filtering
df_analysis['status_date'] = df_analysis['status_timestamp'].dt.date
# Filter for today's data
todays_data = df_analysis[df_analysis['status_date'] == today_melbourne].copy()  # .copy() avoids SettingWithCopyWarning later
# Count the number of rows in today's data
num_rows_today = len(todays_data)
print(f"There are {num_rows_today} rows in today's data.")
Unique dates in the dataset: [datetime.date(2023, 10, 25) datetime.date(2023, 10, 18) datetime.date(2023, 10, 24) datetime.date(2023, 9, 7) datetime.date(2023, 8, 12) datetime.date(2023, 11, 16) datetime.date(2023, 11, 22) datetime.date(2023, 6, 20) datetime.date(2023, 11, 17) datetime.date(2023, 11, 20) datetime.date(2023, 11, 21) datetime.date(2023, 11, 13) datetime.date(2023, 11, 11) datetime.date(2023, 11, 14) datetime.date(2023, 11, 9) datetime.date(2023, 11, 8) datetime.date(2023, 11, 15) datetime.date(2023, 10, 27) datetime.date(2023, 12, 5) datetime.date(2023, 11, 3) datetime.date(2023, 11, 1) datetime.date(2023, 11, 26) datetime.date(2023, 12, 4) datetime.date(2023, 9, 22) datetime.date(2023, 7, 5) datetime.date(2023, 11, 6) datetime.date(2023, 9, 14) datetime.date(2023, 10, 16) datetime.date(2023, 8, 16) datetime.date(2023, 8, 14) datetime.date(2023, 8, 8) datetime.date(2023, 10, 19) datetime.date(2023, 10, 11) datetime.date(2023, 8, 4) datetime.date(2023, 9, 6) datetime.date(2023, 8, 11) datetime.date(2023, 9, 27) datetime.date(2023, 7, 4) datetime.date(2023, 10, 17) datetime.date(2024, 3, 15) datetime.date(2023, 8, 17) datetime.date(2023, 12, 14) datetime.date(2023, 12, 13) datetime.date(2023, 12, 15) datetime.date(2023, 6, 12) datetime.date(2023, 12, 6) datetime.date(2023, 12, 7) datetime.date(2023, 10, 26) datetime.date(2023, 10, 4) datetime.date(2023, 8, 2) datetime.date(2023, 11, 12) datetime.date(2023, 9, 28) datetime.date(2023, 8, 5) datetime.date(2023, 10, 30) datetime.date(2023, 11, 29) datetime.date(2023, 12, 8) datetime.date(2023, 11, 30) datetime.date(2023, 11, 28) datetime.date(2023, 12, 18) datetime.date(2023, 11, 2) datetime.date(2023, 8, 29) datetime.date(2023, 10, 3) datetime.date(2023, 8, 15) datetime.date(2023, 12, 2) datetime.date(2023, 10, 15) datetime.date(2023, 10, 28) datetime.date(2023, 9, 1) datetime.date(2023, 10, 31) datetime.date(2023, 12, 12) datetime.date(2023, 10, 5) datetime.date(2023, 12, 20) datetime.date(2023, 12, 17) 
datetime.date(2023, 10, 6) datetime.date(2024, 1, 20) datetime.date(2024, 1, 13) datetime.date(2023, 12, 28) datetime.date(2024, 1, 22) datetime.date(2024, 1, 18) datetime.date(2024, 1, 19) datetime.date(2024, 1, 21) datetime.date(2024, 1, 23) datetime.date(2024, 1, 17) datetime.date(2023, 9, 24) datetime.date(2024, 1, 31) datetime.date(2024, 1, 30) datetime.date(2024, 2, 7) datetime.date(2024, 2, 8) datetime.date(2024, 2, 5) datetime.date(2024, 1, 27) datetime.date(2024, 1, 29) datetime.date(2024, 1, 2) datetime.date(2024, 2, 6) datetime.date(2024, 2, 14) datetime.date(2024, 2, 15) datetime.date(2024, 2, 16) datetime.date(2024, 2, 4) datetime.date(2024, 2, 22) datetime.date(2024, 2, 19) datetime.date(2024, 2, 18) datetime.date(2023, 8, 28) datetime.date(2024, 2, 29) datetime.date(2024, 1, 12) datetime.date(2024, 1, 6) datetime.date(2024, 3, 2) datetime.date(2023, 12, 16) datetime.date(2024, 2, 21) datetime.date(2024, 1, 24) datetime.date(2024, 2, 26) datetime.date(2024, 3, 4) datetime.date(2024, 3, 6) datetime.date(2023, 12, 22) datetime.date(2024, 1, 28) datetime.date(2023, 12, 31) datetime.date(2024, 2, 20) datetime.date(2024, 3, 16) datetime.date(2024, 3, 13) datetime.date(2024, 3, 14) datetime.date(2024, 3, 25) datetime.date(2024, 3, 17) datetime.date(2024, 3, 18) datetime.date(2024, 4, 5) datetime.date(2024, 4, 4) datetime.date(2024, 3, 8) datetime.date(2024, 3, 21) datetime.date(2024, 3, 22) datetime.date(2024, 4, 11) datetime.date(2024, 4, 10) datetime.date(2024, 3, 19) datetime.date(2024, 2, 9) datetime.date(2023, 12, 9) datetime.date(2024, 3, 1) datetime.date(2024, 1, 26) datetime.date(2024, 4, 8) datetime.date(2024, 2, 23) datetime.date(2024, 2, 12) datetime.date(2024, 4, 9) datetime.date(2024, 4, 1) datetime.date(2024, 3, 23) datetime.date(2024, 4, 30) datetime.date(2024, 4, 29) datetime.date(2024, 4, 24) datetime.date(2024, 4, 13) datetime.date(2024, 4, 17) datetime.date(2024, 4, 16) datetime.date(2023, 6, 13) datetime.date(2024, 4, 15) 
datetime.date(2024, 4, 12) datetime.date(2024, 4, 19) datetime.date(2024, 4, 18) datetime.date(2024, 4, 21) datetime.date(2024, 4, 7) datetime.date(2024, 4, 22) datetime.date(2024, 4, 23) datetime.date(2024, 5, 2) datetime.date(2024, 5, 3) datetime.date(2024, 5, 6) datetime.date(2024, 3, 27) datetime.date(2024, 3, 31) datetime.date(2024, 4, 2) datetime.date(2024, 5, 11) datetime.date(2024, 5, 9) datetime.date(2024, 5, 8) datetime.date(2024, 5, 10) datetime.date(2023, 12, 1) datetime.date(2024, 5, 7) datetime.date(2024, 5, 5) datetime.date(2023, 11, 23) datetime.date(2024, 2, 17) datetime.date(2023, 9, 13) datetime.date(2024, 3, 26) datetime.date(2023, 8, 23) datetime.date(2023, 6, 8) datetime.date(2023, 6, 7) datetime.date(2024, 2, 25) datetime.date(2024, 4, 28) datetime.date(2023, 8, 24) datetime.date(2023, 11, 27) datetime.date(2024, 3, 5) datetime.date(2024, 5, 1) datetime.date(2024, 3, 20) datetime.date(2024, 4, 27) datetime.date(2024, 5, 4) datetime.date(2024, 2, 2) datetime.date(2023, 9, 25) datetime.date(2024, 4, 3) datetime.date(2023, 10, 10) datetime.date(2023, 8, 27) datetime.date(2023, 6, 4) datetime.date(2024, 3, 7) datetime.date(2024, 3, 28) datetime.date(2023, 10, 8) datetime.date(2024, 4, 20) datetime.date(2024, 2, 1) datetime.date(2024, 3, 10) datetime.date(2024, 4, 26) datetime.date(2023, 6, 5) datetime.date(2023, 5, 28) datetime.date(2023, 8, 22) datetime.date(2023, 8, 21) datetime.date(2024, 1, 15) datetime.date(2024, 3, 29)]
Today's date in Melbourne: 2024-05-11
There are 3769 rows in today's data.
Get today's data time range
if not todays_data.empty:
    min_timestamp_today = todays_data['status_timestamp'].min()
    max_timestamp_today = todays_data['status_timestamp'].max()
    print(f"The data for today ranges from {min_timestamp_today} to {max_timestamp_today}.")
else:
    print("No data available for today's date to determine the time range.")
The data for today ranges from 2024-05-11 00:02:02+10:00 to 2024-05-11 13:28:51+10:00.
Check if there are parking spaces with multiple statuses today
if todays_data.empty:
    print("No data available for today's analysis.")
else:
    # Group by unique identifier for each parking space and count unique statuses
    status_counts = todays_data.groupby(
        ['latitude', 'longitude'])['status_description'].nunique().reset_index(
        name = 'Unique_Status_Count')
    # Filter for parking spaces with more than one unique status
    multiple_statuses = status_counts[status_counts['Unique_Status_Count'] > 1]
    # Display the result
    if multiple_statuses.empty:
        print("No parking spaces with multiple statuses found for today.")
    else:
        print("Parking spaces with multiple statuses found for today:")
        print(multiple_statuses)
No parking spaces with multiple statuses found for today.
Today's data analysis & Visualization¶
Sensor visualization of all parking spaces in Melbourne city centre.
# Initialize the map centered around a latitude and longitude
m = folium.Map(location = [-37.814, 144.963], zoom_start = 16) # 16 is the zoom level
# Add points to the map using df_analysis
for idx, row in todays_data.iterrows():
    color = "blue" if row['status_description'] == "Unoccupied" else "red"
    popup_text = f"Status: {row['status_description']}<br>Updated: {row['status_timestamp']}"
    folium.CircleMarker(
        location = [row['latitude'], row['longitude']],
        radius = 1,
        color = color,
        fill = True,
        fill_color = color,
        popup = popup_text
    ).add_to(m)
# Show the map
m
Analyze data time distribution
if todays_data.empty:
    print("No data available for today. Cannot perform update time analysis.")
else:
    # Extract the hour from the 'status_timestamp' and create a new column 'update_hour'
    todays_data['update_hour'] = todays_data['status_timestamp'].dt.hour
    # Create a frequency table for the update times
    frequency_table = todays_data['update_hour'].value_counts().sort_index()
    print("Frequency table for update times:")
    print(frequency_table)
    # If there are no update hours, inform the user
    if frequency_table.empty:
        print("No updates have been recorded today.")
    else:
        # Plot a histogram for the update times
        plt.figure(figsize = (10, 6))
        plt.hist(todays_data['update_hour'], bins = 24, range = (0, 23), edgecolor = 'black', alpha = 0.7)
        plt.xticks(range(24))
        plt.xlabel('Hour of the Day')
        plt.ylabel('Frequency')
        plt.title('Distribution of Update Times Today')
        plt.grid(True)
        plt.show()
Frequency table for update times:
update_hour
0       12
1       10
2        5
3       11
4        6
5       11
6       58
7       72
8       51
9      119
10     253
11     572
12    1347
13    1242
Name: count, dtype: int64
Split into vacant and occupied status to visualize time distribution
# ensure 'todays_data' contains data
if todays_data.empty:
    print("No data available for today. Cannot perform update time analysis.")
else:
    # Filter data for 'Present' and 'Unoccupied' states
    present_data = todays_data[todays_data['status_description'] == 'Present']
    unoccupied_data = todays_data[todays_data['status_description'] == 'Unoccupied']
    # Create subplots
    fig, axes = plt.subplots(1, 2, figsize = (12, 6))
    # Find the maximum frequency to set the same y-axis limit for both subplots
    # Check if either DataFrame is empty to avoid NaN in max_frequency
    max_freq_present = present_data['update_hour'].value_counts().max() if not present_data.empty else 0
    max_freq_unoccupied = unoccupied_data['update_hour'].value_counts().max() if not unoccupied_data.empty else 0
    max_frequency = max(max_freq_present, max_freq_unoccupied)
    # Plot histogram for 'Present' state if data is available
    if not present_data.empty:
        axes[0].hist(present_data['update_hour'], bins = 24, edgecolor = 'black', range = (0, 23))
        axes[0].set_title('Distribution of Update Times for Present Spaces')
        axes[0].set_xlabel('Hour of the Day')
        axes[0].set_ylabel('Frequency')
        axes[0].set_ylim([0, max_frequency + 1])  # +1 to ensure visibility if max_frequency is 0
    else:
        axes[0].text(0.5, 0.5, 'No data for Present Spaces', horizontalalignment = 'center', verticalalignment = 'center')
        axes[0].set_title('Distribution of Update Times for Present Spaces')
        axes[0].set_xlabel('Hour of the Day')
        axes[0].set_ylabel('Frequency')
    # Plot histogram for 'Unoccupied' state if data is available
    if not unoccupied_data.empty:
        axes[1].hist(unoccupied_data['update_hour'], bins = 24, edgecolor = 'black', range = (0, 23))
        axes[1].set_title('Distribution of Update Times for Unoccupied Spaces')
        axes[1].set_xlabel('Hour of the Day')
        axes[1].set_ylabel('Frequency')
        axes[1].set_ylim([0, max_frequency + 1])  # +1 to ensure visibility if max_frequency is 0
    else:
        axes[1].text(0.5, 0.5, 'No data for Unoccupied Spaces', horizontalalignment = 'center', verticalalignment = 'center')
        axes[1].set_title('Distribution of Update Times for Unoccupied Spaces')
        axes[1].set_xlabel('Hour of the Day')
        axes[1].set_ylabel('Frequency')
    # Show the plots
    plt.tight_layout()
    plt.show()
Find parking restrictions¶
A parking space involves three pieces of information: the parking bay itself, the bay's restriction description, and the sensor. These are managed by different departments and published as separate datasets. By matching the parking sensors against the bays' geographic locations, we obtain the bay ID for each location; the bay ID then lets us look up the bay's restriction description.
Get the kerbside id of the parking space¶
Fetch the whole dataset from the City of Melbourne Open Data Portal.
# export endpoint
# https://data.melbourne.vic.gov.au/explore/dataset/on-street-parking-bays/table/
dataset2_id = 'on-street-parking-bays'
base_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
# apikey = api_key
export_format = 'csv'  # avoid shadowing the built-in name 'format'
url = f'{base_URL}{dataset2_id}/exports/{export_format}'
params = {
    'select': '*',
    'limit': -1,  # all records
    'lang': 'en',
    'timezone': 'UTC',
    # 'api_key': apikey
}
# GET request
response = requests.get(url, params = params)
if response.status_code == 200:
    # StringIO to read the CSV data
    url_content = response.content.decode('utf-8')
    bay_df = pd.read_csv(StringIO(url_content), delimiter = ';')
    print(bay_df.sample(2, random_state = 999))  # test
else:
    print(f'Request failed with status code {response.status_code}')
roadsegmentid kerbsideid \
153 20992 9609
9672 22090 NaN
roadsegmentdescription latitude \
153 Errol Street between Victoria Street and Queen... -37.803762
9672 Wadey Street between St Kilda Road and Wells S... -37.826066
longitude lastupdated
153 144.949407 2022-08-31
9672 144.970160 2022-08-31
# check length in dataset
print(f'The dataset contains {len(bay_df)} records.')
The dataset contains 19162 records.
Integrate and add road segment id and description to parking sensor dataframe
exclude = ['latitude', 'longitude', 'lastupdated']
bay_selected = bay_df.drop(exclude, axis = 1)
# Keep only records whose kerbsideid is numeric (or missing), then cast to a nullable integer
filtered_bay = bay_selected[bay_selected['kerbsideid'].str.isnumeric() != False].copy()
filtered_bay['kerbsideid'] = pd.to_numeric(
    filtered_bay['kerbsideid'], errors = 'coerce').astype('Int64')
# Perform a left merge
merged_df = pd.merge(df_analysis, filtered_bay, on = ['kerbsideid'], how = 'left')
print(merged_df.head(2))
          lastupdated           status_timestamp  zone_number  \
0 2023-10-25 01:44:02  2023-10-25 12:07:05+11:00       7556.0
1 2023-10-25 01:44:02  2023-10-25 11:57:43+11:00       7556.0

  status_description  kerbsideid   latitude   longitude status_date  \
0            Present        5730 -37.810582  144.968053  2023-10-25
1            Present        5728 -37.810566  144.968109  2023-10-25

   roadsegmentid                             roadsegmentdescription
0        20003.0  Lonsdale Street between Exhibition Street and ...
1        20003.0  Lonsdale Street between Exhibition Street and ...
merged_df.shape
(6158, 10)
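When auditing a left merge like this one, pandas' `indicator=True` flag labels each row as `both` or `left_only`, which makes it easy to count sensors that found no matching bay. A small sketch with made-up IDs (not the real kerbsideid values):

```python
import pandas as pd

# Hypothetical sensor and bay tables sharing a kerbsideid key
sensors = pd.DataFrame({'kerbsideid': [1, 2, 3]})
bays = pd.DataFrame({'kerbsideid': [1, 3], 'roadsegmentid': [20003, 20099]})

# indicator=True adds a '_merge' column describing each row's provenance
audit = sensors.merge(bays, on='kerbsideid', how='left', indicator=True)
unmatched = (audit['_merge'] == 'left_only').sum()
print(f'{unmatched} sensor(s) have no matching bay record.')
```

Here sensor 2 has no bay record, so its `_merge` value is `left_only`; the same check on `merged_df` would reveal how many sensors lack road segment information.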
# Check duplicated rows based on specific columns
kerb_dup = merged_df.duplicated(subset = ['kerbsideid'], keep = False)
# Display the duplicated rows based on 'kerbsideid'
merged_df[kerb_dup]
| lastupdated | status_timestamp | zone_number | status_description | kerbsideid | latitude | longitude | status_date | roadsegmentid | roadsegmentdescription | |
|---|---|---|---|---|---|---|---|---|---|---|
| 416 | 2023-12-05 00:44:41 | 2023-12-05 09:15:56+11:00 | 7644.0 | Present | 18002 | -37.817446 | 144.953536 | 2023-12-05 | 20073.0 | Spencer Street between Little Collins Street a... |
| 417 | 2023-12-05 00:44:41 | 2023-12-05 09:15:56+11:00 | 7644.0 | Present | 18002 | -37.817446 | 144.953536 | 2023-12-05 | 20073.0 | Spencer Street between Little Collins Street a... |
| 418 | 2023-12-05 00:44:41 | 2023-10-11 20:27:19+11:00 | 7333.0 | Present | 17806 | -37.815782 | 144.961823 | 2023-10-11 | 20099.0 | Little Collins Street between Elizabeth Street... |
| 419 | 2023-12-05 00:44:41 | 2023-10-11 20:27:19+11:00 | 7333.0 | Present | 17806 | -37.815782 | 144.961823 | 2023-10-11 | 20099.0 | Little Collins Street between Elizabeth Street... |
| 423 | 2023-12-05 00:44:41 | 2023-12-04 05:16:37+11:00 | 7333.0 | Present | 17807 | -37.815764 | 144.961886 | 2023-12-04 | 20099.0 | Little Collins Street between Elizabeth Street... |
| 424 | 2023-12-05 00:44:41 | 2023-12-04 05:16:37+11:00 | 7333.0 | Present | 17807 | -37.815764 | 144.961886 | 2023-12-04 | 20099.0 | Little Collins Street between Elizabeth Street... |
| 5924 | 2024-05-11 03:30:26 | 2024-05-11 11:31:01+10:00 | 7269.0 | Present | 24986 | -37.811248 | 144.975538 | 2024-05-11 | 21852.0 | Parliament Place between St Andrews Place and ... |
| 5925 | 2024-05-11 03:30:26 | 2024-05-11 11:31:01+10:00 | 7269.0 | Present | 24986 | -37.811248 | 144.975538 | 2024-05-11 | 20073.0 | Spencer Street between Little Collins Street a... |
# Drop duplicated rows based on 'kerbsideid'
merged_df = merged_df.drop_duplicates(subset=['kerbsideid'], keep='first')
print(merged_df.shape)
(6154, 10)
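The duplicate handling above follows the usual pandas pair: `duplicated(..., keep=False)` flags every copy for inspection, while `drop_duplicates(..., keep='first')` retains one row per key. A minimal sketch with made-up IDs:

```python
import pandas as pd

# Invented IDs; 18002 appears twice, as in the inspection table above
toy = pd.DataFrame({'kerbsideid': [18002, 18002, 17806, 24986],
                    'roadsegmentid': [20073, 20073, 20099, 21852]})

# keep=False marks *all* rows sharing a key, not just the later copies
dup_mask = toy.duplicated(subset=['kerbsideid'], keep=False)
print(toy[dup_mask])        # both 18002 rows

# keep='first' retains exactly one row per kerbsideid
deduped = toy.drop_duplicates(subset=['kerbsideid'], keep='first')
print(deduped.shape)        # (3, 2)
```

Using `keep=False` for the display step is what makes both copies of each duplicate visible before deciding which to drop.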
Find out how many empty kerbsideid there are in total.
# Count the number of NaN values in the 'kerbsideid' column
num_nan = merged_df['kerbsideid'].isna().sum()
print(f"There are {num_nan} NaN values in the 'kerbsideid' column.")
There are 0 NaN values in the 'kerbsideid' column.
Find out how many empty kerbsideid there are in today’s data
# Filter to only include today's data
todays_data = merged_df[merged_df['status_date'] == today_melbourne]
# Count the number of NaN values in the 'kerbsideid' column for today's data
num_nan_today = todays_data['kerbsideid'].isna().sum()
print(f"There are {num_nan_today} NaN values in the 'kerbsideid' column for today's data.")
There are 0 NaN values in the 'kerbsideid' column for today's data.
Get parking space description¶
# export endpoint
# https://data.melbourne.vic.gov.au/explore/dataset/on-street-car-park-bay-restrictions/table/
dataset3_id = 'on-street-car-park-bay-restrictions'
base_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
# apikey = api_key
export_format = 'csv'
url = f'{base_URL}{dataset3_id}/exports/{export_format}'
params = {
'select': '*',
'limit': -1, # all records
'lang': 'en',
'timezone': 'UTC',
# 'api_key': apikey
}
# GET request
response = requests.get(url, params = params)
if response.status_code == 200:
# StringIO to read the CSV data
url_content = response.content.decode('utf-8')
restrictions = pd.read_csv(StringIO(url_content), delimiter = ';')
print(restrictions.sample(2, random_state = 999)) # test
else:
print(f'Request failed with status code {response.status_code}')
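The `delimiter = ';'` argument matters here: the Open Data Melbourne CSV export is semicolon-separated, so reading it with pandas' default comma delimiter would collapse each row into a single column. A toy illustration (the payload below is an invented two-line fragment shaped like the export):

```python
import pandas as pd
from io import StringIO

# Minimal semicolon-separated payload shaped like the export (values invented)
csv_text = "bayid;deviceid;description1\n6151;30237;2P MTR M-SAT 7:30-18:30\n"

wrong = pd.read_csv(StringIO(csv_text))                  # default ',' delimiter
right = pd.read_csv(StringIO(csv_text), delimiter=';')

print(wrong.shape)   # (1, 1): the whole row lands in one column
print(right.shape)   # (1, 3): three proper columns
```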
bayid deviceid description1 description2 \
540 6151 30237 2P MTR M-SAT 7:30-18:30 2P SUN 7:30-18:30
2353 6759 23157 2P MTR M-SAT 7:30-18:30 2P SUN 7:30-18:30
description3 description4 description5 description6 disabilityext1 \
540 NaN NaN NaN NaN 240
2353 NaN NaN NaN NaN 240
disabilityext2 ... today3 today4 today5 today6 typedesc1 \
540 240.0 ... NaN NaN NaN NaN 2P Meter
2353 240.0 ... NaN NaN NaN NaN 2P Meter
typedesc2 typedesc3 typedesc4 typedesc5 typedesc6
540 2P NaN NaN NaN NaN
2353 2P NaN NaN NaN NaN
[2 rows x 62 columns]
# check length in dataset 3
print(f'The dataset contains {len(restrictions)} records.')
The dataset contains 4263 records.
Keep the columns description1 to description5, which describe the parking-bay restrictions. After online inspection of On-street Car Park Bay Restrictions, description6, although documented as a time-limit description, actually contains only the placeholder date 1 January AD 1 and no usable description, so it is excluded manually.
# rename the 'bayid' column to 'kerbsideid'
restrictions.rename(columns = {'bayid': 'kerbsideid'}, inplace = True)
# Merge merged_df with only the relevant columns from restrictions
columns_to_merge = ['kerbsideid', 'description1', 'description2',
'description3', 'description4', 'description5']
merged_df = pd.merge(
merged_df,
restrictions[columns_to_merge],
    on = 'kerbsideid',
how = 'left')
print(merged_df.head(2))
lastupdated status_timestamp zone_number \
0 2023-10-25 01:44:02 2023-10-25 12:07:05+11:00 7556.0
1 2023-10-25 01:44:02 2023-10-25 11:57:43+11:00 7556.0
status_description kerbsideid latitude longitude status_date \
0 Present 5730 -37.810582 144.968053 2023-10-25
1 Present 5728 -37.810566 144.968109 2023-10-25
roadsegmentid roadsegmentdescription \
0 20003.0 Lonsdale Street between Exhibition Street and ...
1 20003.0 Lonsdale Street between Exhibition Street and ...
description1 description2 description3 description4 \
0 2P MTR M-SAT 7:30-18:30 2P SUN 7:30-18:30 NaN NaN
1 NaN NaN NaN NaN
description5
0 NaN
1 NaN
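To quantify how many sensor records actually matched a restriction row, `pd.merge` accepts `indicator=True`, which adds a `_merge` column recording where each row came from. A toy sketch (the IDs and description are invented):

```python
import pandas as pd

sensors = pd.DataFrame({'kerbsideid': [5730, 5728, 5750]})
restr = pd.DataFrame({'kerbsideid': [5730],
                      'description1': ['2P MTR M-SAT 7:30-18:30']})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join
merged = pd.merge(sensors, restr, on='kerbsideid', how='left', indicator=True)
print(merged['_merge'].value_counts())
```

Rows flagged `left_only` are sensor bays with no restriction data, which is exactly the gap this notebook observes in the real datasets.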
Add time limits to each parking space
# Columns to check for "DIS"
columns_to_check = ['description1', 'description2', 'description3', 'description4', 'description5']
# Find rows where any of the specified columns contain "DIS"; na=False treats missing values as non-matches
contains_dis = merged_df[columns_to_check].apply(lambda x: x.str.contains('DIS', na = False)).any(axis = 1)
# Extract rows with "DIS" into a separate DataFrame
disabled_parking_df = merged_df[contains_dis]
# Delete rows with "DIS" from the original DataFrame
merged_df = merged_df[~contains_dis]
# Display the new DataFrames
print("Disabled Parking Data:")
print(disabled_parking_df.head())
print("Remaining Data:")
print(merged_df.head())
Disabled Parking Data:
lastupdated status_timestamp zone_number \
31 2023-10-25 01:44:02 2023-10-25 12:03:28+11:00 7572.0
35 2023-10-25 01:44:02 2023-10-25 12:23:52+11:00 7572.0
55 2023-11-22 00:44:42 2023-11-22 11:07:08+11:00 NaN
122 2023-12-05 00:44:41 2023-10-25 11:19:04+11:00 7570.0
196 2023-10-25 01:44:02 2023-10-25 12:40:25+11:00 7572.0
status_description kerbsideid latitude longitude status_date \
31 Present 6081 -37.809592 144.971447 2023-10-25
35 Present 6087 -37.809536 144.971637 2023-10-25
55 Present 6237 -37.817866 144.958069 2023-11-22
122 Present 6035 -37.809817 144.970344 2023-10-25
196 Unoccupied 6080 -37.809599 144.971422 2023-10-25
roadsegmentid roadsegmentdescription \
31 20001.0 Lonsdale Street between Spring Street and Exhi...
35 20001.0 Lonsdale Street between Spring Street and Exhi...
55 20118.0 Collins Street between William Street and King...
122 20001.0 Lonsdale Street between Spring Street and Exhi...
196 20001.0 Lonsdale Street between Spring Street and Exhi...
description1 description2 \
31 2P DIS M-SAT 7:30-18:30 NaN
35 P DIS Sun 00:00 - 11:59PM P DIS M-Sat 6.30-11.59
55 2P DIS M-SUN 0:00-23:59 NaN
122 2P DIS M-SAT 7:30-18:30 NaN
196 2P DIS M-SAT 7:30-18:30 NaN
description3 description4 description5
31 NaN NaN NaN
35 2P DIS M-SAT 7:30-18:30 NaN NaN
55 NaN NaN NaN
122 NaN NaN NaN
196 NaN NaN NaN
Remaining Data:
lastupdated status_timestamp zone_number \
0 2023-10-25 01:44:02 2023-10-25 12:07:05+11:00 7556.0
1 2023-10-25 01:44:02 2023-10-25 11:57:43+11:00 7556.0
2 2023-10-25 01:44:02 2023-10-25 12:12:46+11:00 7556.0
3 2023-10-25 01:44:02 2023-10-25 06:31:04+11:00 7556.0
4 2023-10-25 01:44:02 2023-10-25 10:56:41+11:00 7556.0
status_description kerbsideid latitude longitude status_date \
0 Present 5730 -37.810582 144.968053 2023-10-25
1 Present 5728 -37.810566 144.968109 2023-10-25
2 Present 5750 -37.810558 144.968137 2023-10-25
3 Present 5743 -37.810321 144.968947 2023-10-25
4 Present 5749 -37.810286 144.969062 2023-10-25
roadsegmentid roadsegmentdescription \
0 20003.0 Lonsdale Street between Exhibition Street and ...
1 20003.0 Lonsdale Street between Exhibition Street and ...
2 20003.0 Lonsdale Street between Exhibition Street and ...
3 20003.0 Lonsdale Street between Exhibition Street and ...
4 20003.0 Lonsdale Street between Exhibition Street and ...
description1 description2 description3 description4 \
0 2P MTR M-SAT 7:30-18:30 2P SUN 7:30-18:30 NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 1P MTR M-SAT 7:30-18:30 1P SUN 7:30-18:30 NaN NaN
4 2P MTR M-SAT 7:30-18:30 2P SUN 7:30-18:30 NaN NaN
description5
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
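One detail worth noting: `Series.str.contains` returns NaN for missing values, and passing `na=False` keeps the mask strictly boolean so the filter is unambiguous. A small sketch with invented descriptions:

```python
import pandas as pd

# Invented restriction strings; the third entry is missing
toy = pd.DataFrame({'description1': ['2P DIS M-SAT 7:30-18:30',
                                     '2P MTR M-SAT 7:30-18:30',
                                     None]})

# na=False maps missing descriptions to False instead of NaN
mask = toy['description1'].str.contains('DIS', na=False)
print(mask.tolist())   # [True, False, False]
```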
Time limit to check how many parking spaces are empty.
# Columns to check for NaN values
columns_to_check = ['description1', 'description2', 'description3', 'description4', 'description5']
# Find rows where all specified columns are NaN
all_nan_rows = merged_df[columns_to_check].isna().all(axis = 1)
# Count the number of such rows
num_all_nan_rows = all_nan_rows.sum()
# Display the count
print(f"Number of rows where all of the specified columns are NaN: {num_all_nan_rows}")
Number of rows where all of the specified columns are NaN: 5481
Conclusion¶
We have integrated bay_id and the parking restrictions into each parking-bay record; the number of records is unchanged.
Inspection shows that the time limits for most sensor-equipped bays have never been integrated into the database, so the extent of the missing restriction data does not match reality.
The three datasets (parking sensors, parking bays, and parking restrictions) are supplied by different systems or departments, so missing data is to be expected. We will therefore assume parking time restrictions in what follows.
Today's parking space analysis¶
Because most bays lack a recorded time limit, we assume a 2-hour limit and an average parking duration of 1 hour. The sensors provide only a snapshot: each sensor records just its most recent state. Shifting the timestamps of currently empty bays back by one hour therefore approximates the assumed parking start time, which lets us analyse the theoretical most recent parking peak.
Time processing¶
# Step 1: Create a copy of today's data
todays_data_copy = todays_data.copy()
# Step 2: Filter to only include rows where the parking space is vacant
vacant_indices = todays_data_copy[todays_data_copy['status_description'] == 'Unoccupied'].index
# Step 3: Subtract 60 minutes from the timestamp of these vacant spaces
todays_data_copy.loc[vacant_indices, 'status_timestamp'] = todays_data_copy.loc[
    vacant_indices, 'status_timestamp'] - Minute(60)
# Step 4: Update the 'status_date' column based on the new 'status_timestamp'
todays_data_copy['status_date'] = todays_data_copy['status_timestamp'].dt.date
# Step 5: Change the status of these back-shifted vacant spaces to 'Occupied'
todays_data_copy.loc[vacant_indices, 'status_description'] = 'Occupied'
# Step 6: Remove rows where the 'status_date' is not today
todays_data_copy = todays_data_copy[todays_data_copy['status_date'] == today_melbourne]
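The one-hour back-shift relies on a pandas `Minute` offset applied to timezone-aware timestamps; subtracting the offset preserves the timezone. A minimal sketch (the timestamp is invented):

```python
import pandas as pd
from pandas.tseries.offsets import Minute

# One invented Melbourne-time timestamp (AEST, UTC+10)
s = pd.Series(pd.to_datetime(['2024-05-11 11:31:01+10:00']))

# Subtracting a Minute offset shifts the wall-clock time, keeping the tz
shifted = s - Minute(60)
print(shifted.iloc[0])   # 2024-05-11 10:31:01+10:00
```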
Popular parking hours¶
# Convert 'status_timestamp' to just the hour of the day
todays_data_copy['status_hour'] = todays_data_copy['status_timestamp'].dt.hour
# Create a frequency table for the update times
frequency_table = todays_data_copy['status_hour'].value_counts().sort_index()
# Plot a histogram for the update times
plt.figure(figsize = (12, 6))
plt.bar(frequency_table.index, frequency_table.values, color = 'blue', alpha = 0.7, label = 'Frequency')
plt.xlabel('Hour of the Day')
plt.ylabel('Frequency')
plt.title('Temporal Distribution of Parking Space Occupancy')
plt.xticks(range(0, 24))
plt.grid(True)
plt.legend()
plt.show()
If the frequencies stay level up to the most recent hour, the parking peak can be considered still in progress; if they have already started to decline, the previous peak can be considered to have passed.
Parking hotspot visualization¶
Today's parking spaces are clustered and marked on the map.
# Create a new DataFrame with only the latitude and longitude columns
X = todays_data_copy[['latitude', 'longitude']].values
# Run DBSCAN clustering
dbscan = DBSCAN(eps = 0.0005, min_samples = 20)
labels = dbscan.fit_predict(X)
# Add cluster labels back to DataFrame
todays_data_copy['cluster_label'] = labels
# Calculate the centroid of each cluster
cluster_centroids = todays_data_copy.groupby('cluster_label').agg(
{'latitude': 'mean', 'longitude': 'mean'}).reset_index()
# Initialize the map
m = folium.Map(location = [-37.814, 144.963], zoom_start = 16)
# Add circles for each cluster
for idx, row in cluster_centroids.iterrows():
if row['cluster_label'] != -1: # -1 is the label for noise points in DBSCAN
folium.Circle(
location = [row['latitude'], row['longitude']],
radius = 50, # Radius
color = 'blue',
fill = True,
fill_color = 'blue'
).add_to(m)
# Show the map
m
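The clustering above runs DBSCAN on raw latitude/longitude degrees with a Euclidean `eps`, which is a reasonable local approximation for a small area like the CBD. For a more distance-faithful variant, scikit-learn's DBSCAN also supports the haversine metric on coordinates in radians, with `eps` expressed as metres divided by the Earth's radius. A hedged sketch with invented points near the CBD:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Invented coordinates: three points within ~30 m, one far outlier
coords_deg = np.array([
    [-37.8140, 144.9630],
    [-37.8141, 144.9631],
    [-37.8139, 144.9629],
    [-37.9000, 145.1000],   # far-away point, expected to be noise
])

# haversine expects radians; eps of ~100 m on an Earth radius of 6,371,000 m
eps_rad = 100 / 6_371_000
db = DBSCAN(eps=eps_rad, min_samples=2, metric='haversine')
labels = db.fit_predict(np.radians(coords_deg))
print(labels)   # the three close points share a cluster; the outlier is -1
```

With this metric the `eps` threshold has a direct physical meaning in metres, instead of degrees whose ground distance varies with latitude.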
Conclusion¶
Multiple analysis methods have been tried so far, combining multiple APIs. However, because parking time limits are largely missing and the sensors are snapshot-based, we can only analyse theoretical parking hotspots, which will deviate from the actual situation.
Analysing parking behaviour over a long period would require a server collecting the data in real time for an extended stretch, which is difficult for ordinary readers to reproduce.
This analysis is therefore intended as data exploration. It also provides an analysis reference for data that cannot be updated or fully integrated, and guides readers to recognise the importance of data screening and processing: unprocessed data demands repeated analysis and filtering, and this notebook offers a reference for those data-processing ideas.